SemanticScuttle - klotz.me » Tags: retrieval-augmented generation

Tags: retrieval-augmented generation*

0 bookmark(s) - Sort by: Date ↓ / Title /

RAG vs Fine-Tuning Explained: What They Actually Do and When to Use Each

The article clarifies that RAG and fine-tuning are complementary rather than competing techniques for LLM development. RAG works by retrieving external information at inference time, which enables models to access new data and provide citable answers without changing the model weights. In contrast, fine-tuning adjusts a model's internal weights to improve its behavior, such as tone or adherence to specific output formats like JSON.

- RAG provides dynamic knowledge retrieval for accuracy and traceability.
- Fine-tuning improves task performance, style, and formatting consistency.
- Combining both methods allows developers to manage both what a model knows and how it communicates.

2026-07-13 Tags: llm, fine tuning, python, rag, maria mouschoutzi, towardsdatascience by klotz

Assemble Each RAG Generation Prompt from a Base Prompt Plus the Rules Each Question Needs

>"Enterprise Document Intelligence – A fixed BASE, the rules each question needs, one registry: the dispatcher that turns a parsed question into a typed LLM call"

Instead of "mega-prompts," use a Dispatcher Pattern to assemble a `BASE` prompt with specific fragments (shape and constraints) at runtime. This improves accuracy, simplifies maintenance, and aids auditing.

* Modular Prompting: Uses "shape fragments" (formatting/extraction) and "constraint fragments" (specific rules).
* Execution Modes: Combined (sends all chunks at once) vs sequential (Iterative chunk processing to save costs)
* Structural Scoping: Uses query hints (e.g., page numbers) to refine retrieval.
* Best Practices: Use Temperature 0, maintain a 20–30% context window buffer, and log raw model responses.

2026-07-06 Tags: kezhan shi, rag, prompt engineering, document, document parsing, question parsing, retrieval, generation. by klotz

Context Engineering for RAG: The Four Typed Inputs Behind Every RAG Answer

Context engineering shifts RAG focus from prompt tuning to structured data assembly for LLM calls. The single-document architecture utilizes four bricks—parsing, question parsing, retrieval, and generation—to produce typed context pieces. These include system prompts, filtered document segments, and structured metadata. This engineering discipline improves auditability, enables caching, and supports scalable component composition.

- Four-brick pipeline: parsing, question parsing, retrieval, generation
- Typed data outputs for LLM context assembly
- Fixed system prompts for caching efficiency
- Filtered document lines and structured metadata
- Improved auditability and cost control

2026-07-01 Tags: rag, context engineering, llm, document intelligence, architecture by klotz

Vision LLMs are PDF Parsers Too: Reading Charts and Diagrams for RAG

This article explores the capacity of Vision Language Models (VLMs) to serve as advanced document parsers. It addresses the limitations of traditional text extraction methods when encountering visual elements like charts, diagrams, and tables within PDFs. By leveraging vision capabilities, these models enable more effective Retrieval-Augmented Generation (RAG) systems by interpreting multimodal content that is typically lost in standard text parsing workflows.
* Limitations of conventional PDF text extraction
* Capabilities of VLMs in understanding visual data structures
* Enhancing RAG pipelines through multimodal document analysis

2026-06-15 Tags: vision language models, vlm, pdf parsing, rag, multimodal ai, data extraction by klotz

Parse PDFs for RAG locally with Docling (Rich Tables, No Cloud Upload)

This article examines Docling, a tool from IBM Research that converts complex PDF documents into structured Markdown or JSON for RAG applications. It offers a local-first approach to ensure data privacy and provides high-fidelity extraction of rich tables and layouts without relying on cloud services.

2026-06-14 Tags: docling, pdf, rag, ibm by klotz

Making Sense of Sensors: Improving LLM Interpretation of Time-Series Data

* **Problem:** LLMs struggle to derive reliable meaning from raw sensor signals, often producing non-actionable or factually incorrect interpretations of time-series data.
* **Methodology:** The study implements a structured RAG-based prompt structure that combines water consumption measurements with descriptive statistics and qualitative user information (such as household water practices).
* **Key Finding:** Augmenting prompts with multidimensional contextual information leads to much higher evaluation scores for grounding, pattern recognition, and actionable recommendations.

2026-06-12 Tags: andres rico, kent larson, mit media lab, llm, time series, rag, mit, hallux by klotz

AxolRAG

This research investigates ways to help large language models interpret time-series sensor data by augmenting measurements with statistical summaries, detected patterns, and environmental context. The study evaluates baseline LLMs, fine-tuned models, and retrieval-augmented generation approaches, finding that combining specialized training with contextual information significantly improves grounding, actionability, and pattern recognition while reducing hallucinations.
* Augmenting time-series data with social and environmental context
* Comparing RAG frameworks against baseline and fine-tuned LLMs
* Enhancing the reliability of automated sensor monitoring systems

2026-06-12 Tags: axolrag, time-series data, large language models, retrieval-augmented generation, sensor interpretation by klotz

Beyond extract_text: The two layers of a PDF that drive RAG quality

This article examines why basic text extraction from PDFs often falls short when building Retrieval Augmented Generation (RAG) pipelines. It highlights how losing visual layout information results in lost semantic context, affecting model accuracy and retrieval performance. The author introduces the concept of two critical layers within a document: the physical layer involving raw character data and coordinates, and the logical layer that constructs meaning through structural elements like headings, tables, and multi-column layouts.
- Why standard text extraction limits RAG performance
- Understanding physical versus logical PDF layers
- The role of layout awareness in preserving semantic context

2026-06-12 Tags: rag, pdf, parsing, document layout analysis, information retrieval, llm by klotz

I gave my local LLM persistent context, and it finally stopped making the same mistakes

Unlike cloud AI services like Claude or Gemini, local LLMs lack built-in workspace features for persistent memory. You can bridge this gap using "context journaling" via system prompts and RAG.

* LM Studio presets for concise system prompts.
* RAG document uploads for background/project history.
* Markdown journal structure (Background, Projects, Corrections).
* “Corrections” section to prevent recurring model errors.
* Session exports for prompt effectiveness records.

2026-05-16 Tags: local llm, lm studio, tools, persistent context, rag, system prompts by klotz

Hybrid Search and Re-Ranking in Production RAG

This article explores techniques for optimizing Retrieval-Augmented Generation (RAG) systems by implementing hybrid search and re-ranking mechanisms. It details how to combine dense vector embeddings with sparse keyword matching, such as BM25, to improve retrieval accuracy, followed by the use of a cross-encoder reranker to ensure only the most relevant context is passed to a Large Language Model in production environments.

2026-05-13 Tags: rag, hybrid search, re-ranking, semantic search, vector database, llm by klotz

First / Previous / Next / Last / Page 1 of 0

SemanticScuttle - klotz.me

Tags: retrieval-augmented generation*

Linked Tags

Related Tags